.. _cross_data:

Generating a Cross-Sectional Dataset
***************************************

.. |ynetto ppfad| raw:: html

   "ynetto"

.. |ynetto ppfad2| raw:: latex

   \href{https://paneldata.org/soep-core/data/ppfad/ynetto}{\textbf{"ynetto"}}

.. |ypop ppfad| raw:: html

   "ypop"

.. |ypop ppfad2| raw:: latex

   \href{https://paneldata.org/soep-core/data/ppfad/ypop}{\textbf{"ypop"}}

.. |yp0101 yp| raw:: html

   "yp0101"

.. |yp0101 yp2| raw:: latex

   \href{https://paneldata.org/soep-core/data/yp/yp0101}{\textbf{"yp0101"}}

.. |emplst08 ypgen| raw:: html

   "emplst08"

.. |emplst08 ypgen2| raw:: latex

   \href{https://paneldata.org/soep-core/data/ypgen/emplst08}{\textbf{"emplst08"}}

.. |hinc08 yhgen| raw:: html

   "hinc08"

.. |hinc08 yhgen2| raw:: latex

   \href{https://paneldata.org/soep-core/data/yhgen/hinc08}{\textbf{"hinc08"}}

.. |yphrf phrf| raw:: html

   "yphrf"

.. |yphrf phrf2| raw:: latex

   \href{https://paneldata.org/soep-core/data/yp/yp0101}{\textbf{"yphrf"}}

.. |pid ppfad| raw:: html

   "pid"

.. |pid ppfad2| raw:: latex

   \href{https://paneldata.org/soep-core/data/ppfad/pid}{\textbf{"pid"}}

.. |cid ppfad| raw:: html

   "cid"

.. |cid ppfad2| raw:: latex

   \href{https://paneldata.org/soep-core/data/ppfad/cid}{\textbf{"cid"}}

.. |hid_2008 ppfad| raw:: html

   "hid_2008"

.. |hid_2008 ppfad2| raw:: latex

   \href{https://paneldata.org/soep-core/data/ppfad/hid_2008}{\textbf{"hid_2008"}}

.. |psample ppfad| raw:: html

   "psample"

.. |psample ppfad2| raw:: latex

   \href{https://paneldata.org/soep-core/data/ppfad/psample}{\textbf{"psample"}}

.. |sex ppfad| raw:: html

   "sex"

.. |sex ppfad2| raw:: latex

   \href{https://paneldata.org/soep-core/data/ppfad/sex}{\textbf{"sex"}}

.. |gebjahr ppfad| raw:: html

   "gebjahr"

.. |gebjahr ppfad2| raw:: latex

   \href{https://paneldata.org/soep-core/data/ppfad/gebjahr}{\textbf{"gebjahr"}}

.. |yp10601 yp| raw:: html

   "yp10601"

.. |yp10601 yp2| raw:: latex

   \href{https://paneldata.org/soep-core/data/yp/yp10601}{\textbf{"yp10601"}}

This example generates a dataset for analyzing the determinants of health satisfaction in 2008. You can either use the Paneldata.org syntax generator or write your own syntax file. You can search for the variable names on Paneldata.org or use the variables listed below directly.

**1. Generate a cross-sectional dataset for the year 2008, which should contain all persons with the following characteristics:**

- Respondents in 2008 |ynetto ppfad| |ynetto ppfad2|
- Lived in a private household in 2008 |ypop ppfad| |ypop ppfad2|

The dataset should contain the following variables of interest:

- satisfaction with health |yp0101 yp| |yp0101 yp2|
- smoking currently yes/no |yp10601 yp| |yp10601 yp2|
- current employment status |emplst08 ypgen| |emplst08 ypgen2|
- monthly household net income |hinc08 yhgen| |hinc08 yhgen2|

In addition, the dataset should contain the following information for a 2008 cross-sectional analysis (these variables are added automatically by Paneldata.org):

- current cross-section weighting factor |yphrf phrf| |yphrf phrf2|
- personal number |pid ppfad| |pid ppfad2|
- original household number |cid ppfad| |cid ppfad2|
- current household number |hid_2008 ppfad| |hid_2008 ppfad2|
- sample affiliation |psample ppfad| |psample ppfad2|
- gender |sex ppfad| |sex ppfad2|
- year of birth |gebjahr ppfad| |gebjahr ppfad2|

**Create an exercise path with four subfolders:**

.. figure:: png/uebungspfade.png
   :align: center

**Example:**

- H:/material/exercises/do
- H:/material/exercises/output
- H:/material/exercises/temp
- H:/material/exercises/log

These folders store do-files, output datasets, temporary datasets, and log files. Open an empty do-file and define the paths you created as globals:

.. literalinclude:: docs/querschnitt_data.do
   :linenos:
   :lines: 8-16

The global "AVZ" defines the main path.
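
The path globals named in this section could be set up along the following lines. This is only a minimal sketch under stated assumptions: the folder names follow the example folders above, the assignment of each global to a folder is inferred from its name, and the value of "MY_IN_PATH" is a placeholder that must point to your own SOEP data. The included do-file remains the authoritative version.

.. code-block:: stata

   * Main exercise path (example location; adjust to your system)
   global AVZ         "H:/material/exercises"

   * Placeholder (assumption): replace with the folder holding your SOEP data
   global MY_IN_PATH  "H:/material/soep/data"

   * Subfolders of the main path (assumed mapping of globals to folders)
   global MY_DO_FILES "${AVZ}/do"
   global MY_LOG_OUT  "${AVZ}/log"
   global MY_OUT_DATA "${AVZ}/output"
   global MY_OUT_TEMP "${AVZ}/temp"

Because the subfolder globals are built from "AVZ", only that one line has to change if the exercise folder is moved.
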
The main path is subdivided using the globals "MY_IN_PATH", "MY_DO_FILES", "MY_LOG_OUT", "MY_OUT_DATA", and "MY_OUT_TEMP". The global "MY_IN_PATH" contains the path to your data.

Use ppath as the source file and load the required variables. Keep all cases with a completed interview. In addition, your dataset should contain only respondents who were able to answer the questions of interest. For example, you can use the netto code "ynetto" to identify and remove children from your dataset.

.. literalinclude:: docs/querschnitt_data.do
   :linenos:
   :lines: 25-43

Save the modified data temporarily. Then merge your dataset with the SOEP weights and save the result as your master file.

.. literalinclude:: docs/querschnitt_data.do
   :linenos:
   :lines: 46-59

Now prepare the content variables. Extract the variables you need from the respective datasets and save each resulting dataset temporarily.

.. literalinclude:: docs/querschnitt_data.do
   :linenos:
   :lines: 62-73

Merge the datasets you have created with your master file and save the result for analysis.

.. literalinclude:: docs/querschnitt_data.do
   :linenos:
   :lines: 76-90

You have successfully created a cross-sectional dataset for the year 2008.

**2. Recode missing codes to system missing values (Stata)**

In the SOEP, missing values are differentiated by the codes -1 to -8. To learn more about missing codes, see the section :ref:`missings`. For substantive analyses, it is not always necessary to distinguish between the missing codes, so you should know how to convert them to system missing values:

.. literalinclude:: docs/querschnitt_output.do
   :linenos:
   :lines: 13-22

Open your analysis dataset and combine all missing codes into system missing values.

**3. How does average health satisfaction differ**

**a) by gender**

Satisfaction was measured on a scale of 0 to 10. To compare average health satisfaction between women and men, display the mean value for each gender.

.. literalinclude:: docs/querschnitt_output.do
   :linenos:
   :lines: 32-33

.. figure:: png/quer_06.png
   :align: center

Because you added the SOEP weighting factors to your analysis dataset earlier, use them to obtain representative results.

.. literalinclude:: docs/querschnitt_output.do
   :linenos:
   :lines: 34-35

.. figure:: png/quer_07.png
   :align: center

**b) by employment status**

Proceed in the same way to compare health satisfaction by employment status. Compare the mean values again:

.. literalinclude:: docs/querschnitt_output.do
   :linenos:
   :lines: 37-39

.. figure:: png/quer_08.png
   :align: center

Again, apply the SOEP weighting factors for a representative analysis.

.. literalinclude:: docs/querschnitt_output.do
   :linenos:
   :lines: 40-41

.. figure:: png/quer_09.png
   :align: center

**c) by age**

The dataset does not contain an age variable, so you must generate one from the year of birth. The resulting variable is metric and should be categorized for this analysis. Define categories for your age variable and assign suitable labels.

.. literalinclude:: docs/querschnitt_output.do
   :linenos:
   :lines: 43-49

Compare mean health satisfaction across your age categories, both weighted and unweighted.

.. literalinclude:: docs/querschnitt_output.do
   :linenos:
   :lines: 51-52

.. figure:: png/quer_11.png
   :align: center

.. literalinclude:: docs/querschnitt_output.do
   :linenos:
   :lines: 53-54

.. figure:: png/quer_12.png
   :align: center

**d) by income**

As with age, generate a categorized version of household net income:

.. literalinclude:: docs/querschnitt_output.do
   :linenos:
   :lines: 56-60

Display the mean values in weighted and unweighted form:

.. literalinclude:: docs/querschnitt_output.do
   :linenos:
   :lines: 62-63

.. figure:: png/quer_14.png
   :align: center

.. literalinclude:: docs/querschnitt_output.do
   :linenos:
   :lines: 64-65

.. figure:: png/quer_15.png
   :align: center

**e) by smoking status**

Since this variable is nominal, no adjustments are necessary. Display average health satisfaction for smokers and non-smokers in weighted and unweighted form:

.. literalinclude:: docs/querschnitt_output.do
   :linenos:
   :lines: 67-70

.. figure:: png/quer_16.png
   :align: center

.. literalinclude:: docs/querschnitt_output.do
   :linenos:
   :lines: 72-73

.. figure:: png/quer_17.png
   :align: center

Last change: |today|